Intrinsic computation refers to how dynamical systems store, structure, and transform historical and spatial information. By graphing a measure of structural complexity against a measure of randomness, complexity-entropy diagrams display the range and different kinds of intrinsic computation across an entire class of system. Here, we use complexity-entropy diagrams to analyze intrinsic computation in a broad array of deterministic nonlinear and linear stochastic processes, including maps of the interval, cellular automata and Ising spin systems in one and two dimensions, Markov chains, and probabilistic minimal finite-state machines. Since complexity-entropy diagrams are a function only of observed configurations, they can be used to compare systems without reference to system coordinates or parameters. It has been known for some time that in special cases complexity-entropy diagrams reveal that high degrees of information processing are associated with phase transitions in the underlying process space, the so-called "edge of chaos". Generally, though, complexity-entropy diagrams differ substantially in character, demonstrating a genuine diversity of distinct kinds of intrinsic computation.
Discovering organization in the natural world is one of science's central goals. Recent innovations in nonlinear mathematics and physics, in concert with analyses of how dynamical systems store and process information, have produced a growing body of results on quantitative ways to measure natural organization. These efforts had their origin in earlier investigations of the origins of randomness. Eventually, however, it was realized that measures of randomness do not capture the property of organization. This led to recent efforts to develop measures that are, on the one hand, as generally applicable as the randomness measures but which, on the other, capture a system's complexity: its organization, structure, memory, regularity, symmetry, and pattern. Here, analyzing processes from dynamical systems, statistical mechanics, stochastic processes, and automata theory, we show that measures of structural complexity are a necessary and useful complement to describing natural systems only in terms of their randomness. The result is a broad appreciation of the kinds of information processing embedded in nonlinear systems. This, in turn, suggests new physical substrates to harness for future developments of novel forms of computation.



I. INTRODUCTION 

The past several decades have produced a growing body of work on ways to measure the organization of natural systems. (For early work, see, e.g., Refs. [...]; for more recent reviews, see Refs. [...], [22], [23], [...].) The original interest derived from explorations, during the 1960s to the mid-1980s, of behavior generated by nonlinear dynamical systems. The thread that focused especially on pattern and structural complexity originated, in effect, in attempts to reconstruct geometry [29], topology [30], equations of motion [31], periodic orbits [32], and stochastic processes [33] from observations of nonlinear processes. More recently, developing and using measures of complexity has been a concern of researchers studying neural computation [34], [35], the clinical analysis of patterns from a variety of medical signals and imaging technologies [36], [37], [38], and machine learning and synchronization [39], [40], [41], [42], [43], to mention only a few contemporary applications.

These efforts, however, have their origin in an earlier period in which the central concern was not the emergence of organization, but rather the origins of randomness. Specifically, measures were developed and refined that quantify the degree of randomness and unpredictability generated by dynamical systems. These quantities (metric entropy, Lyapunov characteristic exponents, fractal dimensions, and so on) now provide an often-used and well understood set of tools for detecting and quantifying deterministic chaos of various kinds. In the arena of stochastic processes, Shannon's entropy rate predates even these and has been productively used for half a century as a measure of an information source's degree of randomness or unpredictability [44].

Over this long early history, researchers came to ap- 
preciate that dynamical systems were capable of an as- 
tonishing array of behaviors that could not be meaning- 
fully summarized by the entropy rate or fractal dimen- 
sion. The reason for this is that, by their definition, these 
measures of randomness do not capture the property of 
organization. This realization led to the considerable 
contemporary efforts just cited to develop measures that 
are as generally applicable as the randomness measures 
but that capture a system's complexity — its organization, 
structure, memory, regularity, symmetry, pattern, and so 
on. 

Complexity measures which do this are often referred 
to as statistical or structural complexities to indicate that 
they capture a property distinct from randomness. In 
contrast, deterministic complexities — such as the Shan- 
non entropy rate, Lyapunov characteristic exponents, 
and the Kolmogorov-Chaitin complexity — are maximized 
for random systems. In essence, they are simply alterna- 
tives to measuring the same property — degrees of ran- 
domness. Here, we shall emphasize complexity of the 
structural and statistical sort which measures a property 
complementary to randomness. We will demonstrate, 
across a broad range of model systems, that measures 
of structural complexity are a necessary and useful addi- 
tion to describing a process in terms of its randomness. 



A. Structural Complexity 

How might one go about developing a structural complexity measure? A typical starting point is to argue that the structural complexity of a system must reach a maximum between the system's perfectly ordered and perfectly disordered extremes [...]. The basic idea behind these claims is that a system which is either perfectly predictable (e.g., a periodic sequence) or perfectly unpredictable (e.g., a fair coin toss) is deemed to have zero structural complexity. Thus, the argument goes, a system with either zero entropy or maximal entropy (usually normalized to one) has zero complexity; these systems are simple and not highly structured. This line of reasoning further posits that in between these extremes lies complexity. Those objects that we intuitively consider to be complex must involve a continuous element of newness or novelty (i.e., entropy), but not to such an extent that the novelty becomes completely unpredictable and degenerates into mere noise.



In summary, then, it is common practice to require that a structural complexity measure vanish in the perfectly ordered and perfectly disordered limits. Between these limits, the complexity is usually assumed to achieve a maximum. These requirements are often taken as axioms from which one constructs a complexity measure that is a single-valued function of randomness as measured by, say, entropy. In both technical and popular scientific literatures, it is not uncommon to find a "complexity" plotted against entropy in merely schematic form as a sketch of a generic complexity function that vanishes for extreme values of entropy and achieves a maximum in a middle region [...], [47], [...]. Several authors, in fact, have taken these as the only constraints defining complexity [...].

Here we take a different approach: we do not prescribe how complexity depends on entropy. One reason for this is that a useful complexity measure needs to do more than satisfy the boundary conditions of vanishing in the high- and low-entropy limits [...]. In particular, a useful complexity measure should have an unambiguous interpretation that accounts in some direct way for how correlations are organized in a system. To that end we consider a well defined and frequently used complexity measure, the excess entropy, and empirically examine its relationship to entropy for a variety of systems.



B. Complexity-Entropy Diagrams 

The diagnostic tool that will be the focal point for our studies is the complexity-entropy diagram. Introduced in Ref. [14], a complexity-entropy diagram plots structural complexity (vertical axis) versus randomness (horizontal axis) for systems in a given model class. Complexity-entropy diagrams allow for a direct view of the complexity-entropy relationship within and across different systems. For example, one can easily read whether or not complexity is a single-valued function of entropy.

The complexity and entropy measures that we use capture a system's intrinsic computation [19]: how a system stores, organizes, and transforms information. A crucial point is that these measures of intrinsic computation are properties of the system's configurations. They do not require knowledge of the equations of motion or Hamiltonian or of system parameters (e.g., temperature, dissipation, or spin-coupling strength) that generated the configurations. Hence, in addition to the many cases in which they can be calculated analytically, they can be inductively calculated from observations of symbolic sequences or configurations.

Thus, a complexity-entropy diagram measures intrinsic computation in a parameter-free way. This allows for the direct comparison of intrinsic computation across very different classes, since a complexity-entropy diagram expresses this in terms of common "information-processing" coordinates. As such, a complexity-entropy diagram demonstrates how much of a given resource (e.g., stored information) is required to produce a given amount of randomness (entropy), or how much novelty (entropy) is needed to produce a certain amount of statistical complexity.

Recently, a form of complexity-entropy diagram has been used in the study of anatomical MRI brain images [...], [57]. This work showed that complexity-entropy diagrams give a reliable way to distinguish between "normal" brains and those experiencing cortical thinning, a condition associated with Alzheimer's disease. Complexity-entropy diagrams have also recently been used as part of a proposed test to distinguish chaos from noise [58]. And Ref. [59] calculates complexity-entropy diagrams for a handful of different complexity measures using the sequences generated by the symbolic dynamics of various chaotic maps.

Historically, one of the motivations behind complexity-entropy diagrams was to explore the common claim that complexity achieves a sharp maximum at a well defined boundary between the order-disorder extremes. This led, for example, to the widely popularized notion of the "edge of chaos" [...], namely, that objects achieve maximum complexity at a boundary between order and disorder. Although these particular claims have been criticized [68], during the same period it was shown that at the onset of chaos complexity does reach a maximum. Specifically, Ref. [14] showed that the statistical complexity diverges at the accumulation point of the period-doubling route to chaos. This led to an analytical theory that describes exactly the interdependence of complexity and entropy for this universal route to chaos [16]. Similarly, another complexity measure, the excess entropy [...], has also been shown to diverge at the period-doubling critical point.

This latter work gave some hope that there would be a universal relationship between complexity and entropy: that some appropriately defined measure of complexity plotted against an appropriate entropy would have the same functional form for a wide variety of systems. In part, the motivation for this was the remarkable success of scaling and data collapse for critical phenomena. Data collapse is a phenomenon in which certain variables for very different systems collapse onto a single curve when appropriately rescaled near the critical point of a continuous phase transition. For example, the magnetization and susceptibility exhibit data collapse near the ferromagnet-paramagnet transition. See, for example, Refs. [72], [73] for further discussion. Data collapse reveals that different systems (e.g., different materials with different critical temperatures) possess a deep similarity despite differences in their details.

The hope, then, was to find a similar universal curve for complexity as a function of entropy. One now sees that this is not and, fortunately, cannot be the case. Notwithstanding special parametrized examples, such as period-doubling and other routes to chaos, a wide range of complexity-entropy relationships exists [...]. This is a point that we will repeatedly reinforce in the following.

C. Surveying Complexity-Entropy Diagrams 

We will present a survey of the relationships between 
structure and randomness for a number of familiar, well 
studied systems including deterministic nonlinear and 
linear stochastic processes and well known models of com- 
putation. The systems we study include maps of the in- 
terval, cellular automata and Ising models in one and 
two dimensions, Markov chains, and minimal finite-state 
machines. To our knowledge, this is the first such cross- 
model survey of complexity-entropy diagrams. 

The main conclusion that emerges from our results is that there is a large range of possible complexity-entropy behaviors. Specifically, there is not a universal complexity-entropy curve, there is not a general complexity-entropy transition, nor is it the case that complexity-entropy diagrams for different systems are even qualitatively similar. These results give a concrete picture of the very different types of relationship between a system's rate of information production and the structural organization which produces that randomness. This diversity opens up a number of interesting mathematical questions, and it appears to suggest a new kind of richness in nature's organization of intrinsic computation.

Our exploration of intrinsic computation is structured as follows: In Section II we briefly review several information-theoretic quantities, most notably the entropy rate and the excess entropy. In Section III we present results for the complexity-entropy diagrams for a wide range of model systems. In Section IV we discuss our results, make a number of general comments and observations, and conclude by summarizing.

II. ENTROPY AND COMPLEXITY MEASURES 

A. Information-Theoretic Quantities 

The complexity-entropy diagrams we will examine 
make use of two information-theoretic quantities: the ex- 
cess entropy and the entropy rate. In this section we 
fix notation and give a brief but self-contained review of 
them. 

We begin by describing the stochastic process generated by a system. Specifically, we are interested here in describing the character of bi-infinite, one-dimensional sequences: $\overleftrightarrow{S} = \ldots, S_{-2}, S_{-1}, S_0, S_1, \ldots$, where the $S_i$'s are random variables that assume values $s_i$ in a finite alphabet $\mathcal{A}$. Throughout, we follow the standard convention that a lower-case letter refers to a particular value of the random variable denoted by the corresponding upper-case letter. In the following, the index $i$ on the $S_i$'s will refer to either space or time.
A process is, quite simply, the distribution over all possible sequences generated by a system: $P(\overleftrightarrow{S})$. Let $P(s_i^L)$ denote the probability that a block $S_i^L = S_i S_{i+1} \ldots S_{i+L-1}$ of $L$ consecutive symbols takes on the particular values $s_i, s_{i+1}, \ldots, s_{i+L-1} \in \mathcal{A}$. We will assume that the distribution over blocks is stationary: $P(S_i^L) = P(S_{i+M}^L)$ for all $i$, $M$, and $L$. And so we will drop the index on the block probabilities. When there is no confusion, then, we denote by $s^L$ a particular sequence of $L$ symbols, and use $P(s^L)$ to denote the probability that the particular $L$-block occurs.

The support of a process is the set of allowed 
sequences — i.e., those with positive probability. In the 
parlance of computation theory, a process' support is a 
formal language: the set of all finite length words that 
occur at least once in an infinite sequence. 

A special class of processes that we will consider in subsequent sections are order-$R$ Markov chains. These processes are those for which the joint distribution can be conditionally factored into words $S_i^R$ of length $R$, that is,

$$P(\overleftrightarrow{S}) = \cdots P(S_i^R \mid S_{i-R}^R)\, P(S_{i+R}^R \mid S_i^R)\, P(S_{i+2R}^R \mid S_{i+R}^R) \cdots . \qquad (1)$$

In other words, knowledge of the current length-$R$ word is all that is needed to determine the distribution of future symbols. As a result, the states of the Markov chain are associated with the $|\mathcal{A}|^R$ possible values that can be assumed by a length-$R$ word.
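To make the factorization in Eq. (1) concrete, here is a minimal Python sketch of drawing a sample from a binary order-$R$ Markov chain. It is an illustration under stated assumptions, not code from this study: the chain is specified by a dictionary (hypothetical name `p_one`) mapping each length-$R$ word to the probability that the next symbol is 1, and the first $R$ symbols are drawn uniformly at random, an arbitrary choice rather than the chain's stationary distribution.

```python
import random

def sample_order_R_markov(p_one, R, n, seed=0):
    """Draw n symbols from a binary order-R Markov chain (R >= 1).

    p_one: dict mapping each length-R tuple of previous symbols to
           P(next symbol = 1 | that length-R word).
    Assumption: the first R symbols are drawn uniformly at random; a
    stationary start would need the chain's stationary distribution.
    """
    if R < 1:
        raise ValueError("R must be at least 1")
    rng = random.Random(seed)
    seq = [rng.randint(0, 1) for _ in range(R)]
    while len(seq) < n:
        state = tuple(seq[-R:])  # the current length-R word is the Markov state
        seq.append(1 if rng.random() < p_one[state] else 0)
    return seq[:n]
```

For example, `sample_order_R_markov({(0,): 1.0, (1,): 0.0}, R=1, n=10)` yields a strictly alternating sequence, i.e., a period-2 process.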

We now briefly review several central quantities of information theory that we will use to develop measures of unpredictability and entropy. For details see any textbook on information theory; e.g., Ref. [44]. Let $X$ be a random variable that assumes the values $x \in \mathcal{X}$, where $\mathcal{X}$ is a finite set. The probability that $X$ assumes the value $x$ is given by $P(x)$. Also, let $Y$ be a random variable that assumes values $y \in \mathcal{Y}$.

The Shannon entropy of the variable $X$ is given by:

$$H[X] = -\sum_{x \in \mathcal{X}} P(x) \log_2 P(x) . \qquad (2)$$

The units are given in bits. This quantity measures the uncertainty associated with the random variable $X$. Equivalently, $H[X]$ is also the average amount of memory needed to store outcomes of variable $X$.

The joint entropy of two random variables, $X$ and $Y$, is defined as:

$$H[X,Y] = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P(x,y) \log_2 P(x,y) . \qquad (3)$$

It is a measure of the uncertainty associated with the joint distribution $P(X,Y)$. The conditional entropy is defined as:

$$H[X \mid Y] = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P(x,y) \log_2 P(x \mid y) , \qquad (4)$$

and gives the average uncertainty of the conditional probability $P(X \mid Y)$. That is, $H[X \mid Y]$ tells us how uncertain, on average, we are about $X$, given that the outcome of $Y$ is known.

Finally, the mutual information is defined as:

$$I[X;Y] = H[X] - H[X \mid Y] . \qquad (5)$$

It measures the average reduction of uncertainty of one variable due to knowledge of another. If knowing $Y$ on average reduces uncertainty about $X$, then it makes sense to say that $Y$ carries information about $X$. Note that $I[X;Y] = I[Y;X]$.
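Equations (2) through (5) translate directly into a few lines of Python. The sketch below is illustrative only (the function names are hypothetical) and assumes distributions are given as dictionaries of probabilities; it computes $H[X \mid Y]$ through the chain rule $H[X,Y] - H[Y]$, which is algebraically equivalent to Eq. (4) because $\log_2 P(x \mid y) = \log_2 P(x,y) - \log_2 P(y)$.

```python
import math
from collections import defaultdict

def entropy(p):
    """Shannon entropy in bits of {outcome: probability}, Eq. (2)."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def marginals(pxy):
    """Marginals P(x) and P(y) of a joint distribution {(x, y): probability}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), q in pxy.items():
        px[x] += q
        py[y] += q
    return dict(px), dict(py)

def conditional_entropy(pxy):
    """H[X|Y] = H[X,Y] - H[Y], equivalent to Eq. (4)."""
    _, py = marginals(pxy)
    return entropy(pxy) - entropy(py)

def mutual_information(pxy):
    """I[X;Y] = H[X] - H[X|Y], Eq. (5)."""
    px, _ = marginals(pxy)
    return entropy(px) - conditional_entropy(pxy)
```

For two perfectly correlated fair bits, `{(0, 0): 0.5, (1, 1): 0.5}`, knowing $Y$ removes all uncertainty about $X$, so $H[X \mid Y] = 0$ and $I[X;Y] = 1$ bit.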



B. Entropy Growth and Entropy Rate 

With these definitions set, we are ready to develop an information-theoretic measure of a process's randomness. Our starting point is to consider blocks of consecutive variables. The block entropy is the total Shannon entropy of length-$L$ sequences:

$$H(L) = -\sum_{s^L \in \mathcal{A}^L} P(s^L) \log_2 P(s^L) , \qquad (6)$$

where $L > 0$. The sums run over all possible blocks of length $L$. We define $H(0) = 0$. The block entropy grows monotonically with block length: $H(L) \geq H(L-1)$.

For stationary processes the total Shannon entropy typically grows linearly with $L$. That is, for sufficiently large $L$, $H(L) \sim L$. This leads one to define the entropy rate $h_\mu$ as:

$$h_\mu = \lim_{L \to \infty} \frac{H(L)}{L} . \qquad (7)$$

The units of $h_\mu$ are bits per symbol. This limit exists for all stationary sequences [44, Chapter 4.2]. The entropy rate is also known as the metric entropy in dynamical systems theory and is equivalent to the thermodynamic entropy density familiar from equilibrium statistical mechanics.

The entropy rate can be given an additional interpretation as follows. First, we define an $L$-dependent entropy rate estimate:

$$h_\mu(L) = H(L) - H(L-1) \qquad (8)$$

$$h_\mu(L) = H[S_L \mid S_{L-1}, S_{L-2}, \ldots, S_1] , \quad L > 0 . \qquad (9)$$

We set $h_\mu(0) = \log_2 |\mathcal{A}|$. In words, then, $h_\mu(L)$ is the average uncertainty of the next variable $S_L$, given that the previous $L-1$ symbols have been seen. Geometrically, $h_\mu(L)$ is the two-point slope of the total entropy growth curve $H(L)$. Since conditioning on more variables can never increase the entropy, it follows that $h_\mu(L) \leq h_\mu(L-1)$. In the $L \to \infty$ limit, $h_\mu(L)$ is equal to the entropy rate defined above in Eq. (7):

$$h_\mu = \lim_{L \to \infty} h_\mu(L) . \qquad (10)$$

Again, this limit exists for all stationary processes [44]. Equation (10) tells us that $h_\mu$ may be viewed as the irreducible randomness in a process: the randomness that persists even after statistics over longer and longer blocks of variables are taken into account.
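Equations (6) and (8) suggest a direct empirical procedure: count overlapping length-$L$ windows in a single long sample, form $H(L)$, and difference successive values. The Python sketch below assumes exactly that (overlapping windows from one stationary sample, plug-in probabilities); the function names are hypothetical, and plug-in estimates become biased once $L$ is not small compared to $\log_2$ of the sample length.

```python
import math
from collections import Counter

def block_entropies(seq, L_max):
    """Empirical H(0), ..., H(L_max) in bits, Eq. (6), from overlapping
    length-L windows of one stationary sample."""
    H = [0.0]  # H(0) = 0 by definition
    for L in range(1, L_max + 1):
        counts = Counter(tuple(seq[i:i + L]) for i in range(len(seq) - L + 1))
        n = sum(counts.values())
        H.append(-sum(c / n * math.log2(c / n) for c in counts.values()))
    return H

def entropy_rate_estimates(seq, L_max, alphabet_size=2):
    """h_mu(0), ..., h_mu(L_max): h_mu(0) = log2|A| and
    h_mu(L) = H(L) - H(L-1) for L > 0, Eq. (8)."""
    H = block_entropies(seq, L_max)
    return [math.log2(alphabet_size)] + [H[L] - H[L - 1] for L in range(1, L_max + 1)]
```

For the period-2 sequence `[0, 1] * 500`, the estimates are $h_\mu(0) = h_\mu(1) = 1$ bit and then essentially zero for $L \geq 2$: once one symbol is known, the next is certain.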



C. Excess Entropy 

The entropy rate gives a reliable and well understood measure of the randomness or disorder intrinsic to a process. However, as the introduction noted, this tells us little about the underlying system's organization, structure, or correlations. Looking at the manner in which $h_\mu(L)$ converges to its asymptotic value $h_\mu$, however, provides one measure of these properties.

When observations only over length-$L$ blocks are taken into account, a process appears to have an entropy rate of $h_\mu(L)$. This quantity is larger than the true, asymptotic value of the entropy rate $h_\mu$. As a result, the process appears more random by $h_\mu(L) - h_\mu$ bits. Summing these entropy over-estimates over $L$, one obtains the excess entropy

$$\mathbf{E} \equiv \sum_{L=1}^{\infty} \left[ h_\mu(L) - h_\mu \right] . \qquad (11)$$

The units of $\mathbf{E}$ are bits. The excess entropy tells us how much information must be gained before it is possible to infer the actual per-symbol randomness $h_\mu$. It is large if the system possesses many regularities or correlations that manifest themselves only at large scales. As such, the excess entropy can serve as a measure of global structure or correlation present in the system.

This interpretation is strengthened by noting that the excess entropy can also be expressed as the mutual information between two adjacent semi-infinite blocks of variables [...]:

$$\mathbf{E} = \lim_{L \to \infty} I[S_{-L}, S_{-L+1}, \ldots, S_{-1};\, S_0, S_1, \ldots, S_{L-1}] . \qquad (12)$$

Thus, the excess entropy measures one type of memory of the system; it tells us how much knowledge of one half of the system reduces our uncertainty about the other half. If the sequence of random variables is a time series, then $\mathbf{E}$ is the amount of information the past shares with the future.

The excess entropy may also be given a geometric interpretation. The existence of the entropy rate suggests that $H(L)$ grows linearly with $L$ for large $L$ and that the growth rate, or slope, is given by $h_\mu$. It is then possible to show that the excess entropy is the "$y$-intercept" of the asymptotic form for $H(L)$ [...]:

$$H(L) \sim \mathbf{E} + h_\mu L \quad \text{as } L \to \infty . \qquad (13)$$

Or, rearranging, we have

$$\mathbf{E} = \lim_{L \to \infty} \left[ H(L) - h_\mu L \right] . \qquad (14)$$

This form of the excess entropy highlights another interpretation: $\mathbf{E}$ is the cost of amnesia. If an observer has extracted enough information from a system (at large $L$) to predict it optimally ($\sim h_\mu$), but suddenly loses all of that information, the process will then appear more random by an amount $H(L) - h_\mu L$.

To close, note that the excess entropy, originally coined in [...], goes by a number of different names, including "stored information" [...]; "effective measure complexity" [...]; "complexity" [...]; "predictive information" [...]; and "reduced Rényi entropy of order 1" [...]. For recent reviews on excess entropy, entropy convergence in general, and applications of this approach, see Refs. [...], [23], [...].

D. Intrinsic Information Processing Coordinates

In the model classes examined below, we shall take the excess entropy $\mathbf{E}$ as our measure of complexity and use the entropy rate $h_\mu$ as the randomness measure. The excess entropy $\mathbf{E}$ and the entropy rate $h_\mu$ are exactly the two quantities that specify the large-$L$ asymptotic form for the block entropy, Eq. (13). The set of all $(h_\mu, \mathbf{E})$ pairs is thus geometrically equivalent to the set of all straight lines with non-negative slope and intercept. Clearly, a line's slope and intercept are independent quantities. Thus, there is no a priori reason to anticipate any relationship between $h_\mu$ and $\mathbf{E}$, a point emphasized early on by Li [...].

It is helpful in the following to know that for binary order-$R$ Markov processes there is an upper bound on the excess entropy:

$$\mathbf{E} \leq R\,(1 - h_\mu) . \qquad (15)$$

We sketch a justification of this result here; for the derivation, see [27, Proposition 11]. First, recall that the excess entropy may be written as the mutual information between two semi-infinite blocks, as indicated in Eq. (12). However, given that the process is order-$R$ Markovian, Eq. (1), the excess entropy reduces to the mutual information between two adjacent $R$-blocks. From Eq. (5), we see that the excess entropy is the entropy of an $R$-block minus the entropy of an $R$-block conditioned on its neighboring $R$-block:

$$\mathbf{E} = H(R) - H[S_0^R \mid S_{-R}^R] . \qquad (16)$$

(Note that this only holds in the special case of order-$R$ Markov processes. It is not true in general.) The first term on the right-hand side of Eq. (16) is maximized when the distribution over the $R$-block is uniform, in which case $H(R) = R$. The second term on the right-hand side is bounded below by $R h_\mu$, i.e., $R$ times the per-symbol entropy rate, since conditioning can never reduce the uncertainty of each of the $R$ symbols below $h_\mu$. Combining the two bounds gives Eq. (15).
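For the simplest case, $R = 1$, both quantities can be computed exactly, which gives a quick check of Eq. (15). The sketch below assumes a binary order-1 Markov chain described by two switching probabilities (hypothetical names `p01` $= P(1 \mid 0)$ and `p10` $= P(0 \mid 1)$, with `p01 + p10 > 0` so that the stationary distribution is unique). It uses $h_\mu = H[S_1 \mid S_0]$ and Eq. (16) with $R = 1$, i.e., $\mathbf{E} = H(1) - h_\mu$.

```python
import math

def H2(p):
    """Binary Shannon entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def markov1_h_and_E(p01, p10):
    """Exact (h_mu, E) for a binary order-1 Markov chain with
    p01 = P(next=1 | current=0) and p10 = P(next=0 | current=1)."""
    pi1 = p01 / (p01 + p10)  # stationary P(S = 1), from pi0 * p01 = pi1 * p10
    pi0 = 1.0 - pi1
    h_mu = pi0 * H2(p01) + pi1 * H2(p10)  # H[S_1 | S_0]
    E = H2(pi1) - h_mu  # Eq. (16) with R = 1
    return h_mu, E
```

A fair coin (`p01 = p10 = 0.5`) gives $h_\mu = 1$ and $\mathbf{E} = 0$, while the alternating chain (`p01 = p10 = 1`) gives $h_\mu = 0$ and $\mathbf{E} = 1$ bit; both sit on the bound $\mathbf{E} = 1 - h_\mu$.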






It is also helpful in the following to know that for periodic processes $h_\mu = 0$ (perfectly predictable) and $\mathbf{E} = \log_2 p$, where $p$ is the period [13]. In this case, $\mathbf{E}$ is the amount of information required to distinguish the $p$ phases of the cycle.
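Given block entropies $H(0), \ldots, H(L_{\max})$, whether exact or estimated, Eq. (11) can be truncated at $L_{\max}$. The sketch below assumes that $h_\mu$ is approximated by $h_\mu(L_{\max})$; the truncated sum then telescopes to $H(L_{\max}) - L_{\max}\, h_\mu$, the finite-$L$ version of Eq. (14). The function name is hypothetical.

```python
def excess_entropy_from_block_entropies(H):
    """Given H = [H(0), H(1), ..., H(L_max)] in bits, return (h_mu, E).

    h_mu is approximated by h_mu(L_max) = H(L_max) - H(L_max - 1), and E by
    Eq. (11) truncated at L_max (equal to H(L_max) - L_max * h_mu).
    """
    L_max = len(H) - 1
    if L_max < 1:
        raise ValueError("need at least H(0) and H(1)")
    h_L = [H[L] - H[L - 1] for L in range(1, L_max + 1)]
    h_mu = h_L[-1]
    E = sum(h - h_mu for h in h_L)
    return h_mu, E
```

For the period-4 process $\ldots 0011\,0011 \ldots$, the block entropies are $H(0) = 0$, $H(1) = 1$, and $H(L) = 2$ bits for $L \geq 2$, so the function returns $h_\mu = 0$ and $\mathbf{E} = 2 = \log_2 4$, matching the periodic result above. A fair coin, with $H(L) = L$, returns $h_\mu = 1$ and $\mathbf{E} = 0$.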

E. Calculating Complexities and Entropies 

As is now clear, all quantities of interest depend on knowing sequence probabilities $P(s^L)$. These can be obtained by direct analytical approximation given a model or by numerical estimation via simulation. Sometimes, in special cases, the complexity and entropy can be calculated in closed form.

For some, but not all, of the process classes studied in the following, we estimate the various information-theoretic quantities by simulation. We generate a long sequence, keeping track of the frequency of occurrence of words up to some finite length $L$. The word counts are stored in a dynamically generated parse tree, allowing us to go out to $L = 120$ in some cases. We first make a rough estimate of the topological entropy using a small $L$ value. This entropy determines the sparseness of the parse tree, which in turn determines how large a tree can be stored in a given amount of memory. From the word and subword frequencies $P(s^L)$, one directly calculates $H(L)$ and, thus, $h_\mu$ and $\mathbf{E}$. Estimation errors in these quantities are a function of statistical errors in $P(s^L)$.
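A parse tree of this kind can be sketched as a trie whose depth-$L$ nodes hold the counts of length-$L$ words, so that one pass over the sequence yields the counts for every $L \leq L_{\max}$ at once. The following Python sketch is a minimal illustration under that assumption (nested dictionaries, hypothetical function names); it does not reproduce the memory sizing based on the topological entropy described above.

```python
def build_parse_tree(seq, L_max):
    """Count every word of length 1..L_max in seq with a nested-dict trie.

    Each node is {"count": int, "children": {symbol: node}}. Walking down
    from position i increments the counts of seq[i:i+1], ..., seq[i:i+L_max].
    """
    root = {"count": 0, "children": {}}
    for i in range(len(seq)):
        node = root
        for sym in seq[i:i + L_max]:
            node = node["children"].setdefault(sym, {"count": 0, "children": {}})
            node["count"] += 1
    return root

def word_probabilities(root, L):
    """Return {word: P(word)} for all length-L words (L >= 1) in the trie."""
    if L < 1:
        raise ValueError("L must be at least 1")
    counts = {}

    def walk(node, prefix):
        if len(prefix) == L:
            counts[tuple(prefix)] = node["count"]
            return
        for sym, child in node["children"].items():
            walk(child, prefix + [sym])

    walk(root, [])
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}
```

For the short sequence `[0, 1, 1, 0, 1]` with `L_max = 3`, the length-2 words are 01, 11, 10, 01, so `word_probabilities(tree, 2)` assigns probability 0.5 to `(0, 1)`. These probabilities can then be fed into Eq. (6) to obtain $H(L)$.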

Here, we are mainly interested in gaining a general sense of the behavior of the entropy rate $h_\mu$ and the excess entropy $\mathbf{E}$. And so, for the purposes of our survey, this direct method is sufficient. The vast majority of our estimates are accurate to at least 1%. If extremely accurate estimates are needed, there exist a variety of techniques for correcting for estimator bias [...]. When one is working with finite data, there is also the question of what errors occur, since the $L \to \infty$ limit cannot be taken. For more on this issue, see Ref. [27].

Regardless of these potential subtleties, the entropy 
rate and excess entropy can be reliably estimated via 
simulation, given access to a reasonably large amount of 
data. Moreover, this estimation is purely inductive — one 
does not need to use knowledge of the underlying equa- 
tions of motion or the hidden states that produced the se- 
quence. Nevertheless, for several of the model classes we 
consider — one-dimensional Ising models, Markov chains, 
and topological Markov chains — we calculate the quanti- 
ties using closed-form expressions, leading to essentially 
no error. 



III. COMPLEXITY-ENTROPY DIAGRAMS 

In the following sections we present a survey of intrinsic computation across a wide range of process classes. We think of a class of system as given by equations of motion, or other specification for a stochastic process, that are parametrized in some way: a pair of control parameters in a one-dimensional map or the energy of a Hamiltonian, say. The space of parameters, then, is the concrete representation of the space of possible systems, and a class of system is a subset of the set of all possible processes. A point in the parameter space is then a particular system, whose intrinsic computation we will summarize by a pair of numbers, one a measure of randomness, the other a measure of structure. In several cases, these measures are estimated from sequences generated by the temporal or spatial process.



A. One-Dimensional Discrete Iterated Maps 

Here we look at the symbolic dynamics generated by 
two iterated maps of the interval — the well studied logis- 
tic and tent maps — of the form: 

$$x_{n+1} = f_\mu(x_n) , \qquad (17)$$

where $\mu$ is a parameter that controls the nonlinear function $f$, $x_n \in [0,1]$, and one starts with $x_0$, the initial condition. The logistic and tent maps are canonical examples of systems exhibiting deterministic chaos. The nonlinear iterated function $f$ consists of two monotone pieces. And so, one can analyze the maps' behavior on the interval via a generating partition that reduces a sequence of continuous states $x_0, x_1, x_2, \ldots$ to a binary sequence $s_0, s_1, s_2, \ldots$ [86]. The binary partition is given by

$$s_i = \begin{cases} 0 & x_i < \tfrac{1}{2} , \\ 1 & x_i > \tfrac{1}{2} . \end{cases} \qquad (18)$$

The binary sequence may be viewed as a code for the set 
of initial conditions that produce the sequence. When 
the maps are chaotic, arbitrarily long binary sequences 
produced using this partition code for arbitrarily small 
intervals of initial conditions on the chaotic attractor. 
Hence, one can explore many of these maps' properties 
via binary sequences. 



1. Logistic Map 

We begin with the logistic map of the unit interval:

$$f(x) = r x (1 - x) , \qquad (19)$$

where the control parameter $r \in [0, 4]$. We iterate this starting with an arbitrary initial condition $x_0 \in [0, 1]$. In Fig. 1 we show numerical estimates of the excess entropy $\mathbf{E}$ and the entropy rate $h_\mu$ as a function of $r$. Notice that both $\mathbf{E}$ and $h_\mu$ change in a complicated manner as the parameter $r$ is varied continuously.
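The partition of Eq. (18) applied to the logistic map of Eq. (19) takes only a few lines. This is an illustrative Python sketch, not the code behind Fig. 1: the function name `logistic_symbols`, the default initial condition `x0 = 0.4`, and the 1000-step transient are arbitrary assumptions, and the measure-zero case $x_i = 1/2$ is assigned symbol 1.

```python
def logistic_symbols(r, n, x0=0.4, transient=1000):
    """Binary symbolic sequence of f(x) = r x (1 - x), Eq. (19), under the
    partition of Eq. (18): symbol 0 if x < 1/2, else 1 (x = 1/2 -> 1).

    The first `transient` iterates are discarded so that the orbit settles
    onto its attractor; transient length and x0 are arbitrary choices.
    """
    x = x0
    for _ in range(transient):
        x = r * x * (1.0 - x)
    symbols = []
    for _ in range(n):
        symbols.append(0 if x < 0.5 else 1)
        x = r * x * (1.0 - x)
    return symbols
```

At $r = 2.8$ the orbit converges to the stable fixed point $1 - 1/r \approx 0.643$, so every symbol is 1; at $r = 4$ the symbols behave like fair coin flips, so roughly half are 1s. Because a floating-point orbit lives on a finite set of states, very long runs can eventually fall onto spurious cycles, so sample sizes should stay moderate.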
FIG. 1: Excess entropy $\mathbf{E}$ and entropy rate $h_\mu$ as a function of the parameter $r$. The top curve is excess entropy. The $r$ values were sampled uniformly as $r$ was varied from 3.4 to 4.0 in increments of 0.0001. The largest $L$ used was $L = 30$ for systems with low entropy. For each parameter value with positive entropy, $1 \times 10^7$ words of length $L$ were sampled.

As r increases from 3.0 to approximately 3.5926, the logistic map undergoes a series of period-doubling bifurcations. For $r \in (3.0, 3.2361)$ the sequences generated by the logistic map are periodic with period two, for $r \in (3.2361, 3.4986)$ the sequences are period 4, and for $r \in (3.4986, 3.5546)$ the sequences are period 8. For all periodic sequences of period p, the entropy rate $h_\mu$ is zero and the excess entropy E is $\log_2 p$. So, as the period doubles, the excess entropy increases by one bit. This can be seen in the staircase on the left-hand side of Fig. 1. At $r \approx 3.5926$, the logistic map becomes chaotic, as evidenced by a positive entropy rate. For further discussion of the phenomenology of the logistic map, see almost any modern textbook on nonlinear dynamics, e.g., Refs. HElH.
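The step from periodicity to $E = \log_2 p$ can be written out explicitly. For a sequence with primitive period p, the p phases of the cycle give p distinct, equally likely words of every length $L \geq p$, so the block entropy saturates:

```latex
H(L) = \log_2 p \quad (L \ge p), \qquad
h_\mu = \lim_{L \to \infty} \frac{H(L)}{L} = 0, \qquad
E = \lim_{L \to \infty} \left[ H(L) - L\, h_\mu \right] = \log_2 p .
```

A period doubling $p \to 2p$ therefore raises E by $\log_2 2 = 1$ bit, which is the height of each step in the staircase of Fig. 1.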

Looking at Fig. 1, it is difficult to see how E and $h_\mu$ are related. This relationship can be seen much more clearly in Fig. 2, in which we show the complexity-entropy diagram for the same system. That is, we plot $(h_\mu, E)$ pairs. This lets us look at how the excess entropy and the entropy rate are related, independent of the parameter r.

Figure 2 shows that there is a definite relationship between E and $h_\mu$, one that is not immediately evident from looking at Fig. 1. Note, however, that this relationship is not a simple one. In particular, complexity is not a function of entropy: $E \neq g(h_\mu)$. For a given value of $h_\mu$, multiple excess entropy values E are possible.

There are several additional empirical observations to extract from Fig. 2. First, the shape appears to be self-similar. This is not at all surprising, given that the logistic map's bifurcation diagram itself is self-similar. Second, note the clumpy, nonuniform clustering of $(h_\mu, E)$ pairs within the dense region. Third, note that there is a fairly well-defined lower bound. Fourth, for a given value of the entropy rate $h_\mu$ there are many possible values for the excess entropy E. However, it appears as if not all E values are possible for a given $h_\mu$. Lastly,
FIG. 2: Entropy rate and excess entropy $(h_\mu, E)$-pairs for the logistic map. Points from regions of the map in which the bifurcation diagram has one or two bands are colored differently. There are 3214 parameter values sampled for the one-band region and 3440 values for the two-band region. The r values were sampled uniformly. The one-band region is $r \in (3.6786, 4.0)$; the two-band region is $r \in (3.5926, 3.6786)$. The largest L used was L = 30 for systems with low entropy. For each parameter value with positive entropy, $1 \times 10^7$ words of length L were sampled.

note that there does not appear to be any phase transition (at finite $h_\mu$) in the complexity-entropy diagram. Strictly speaking, such a transition does occur, but it does so at zero entropy rate. As the period doublings accumulate, the excess entropy grows without bound. As a result, the possible excess entropy values at $h_\mu = 0$ on the complexity-entropy diagram are unbounded. For further discussion, see Ref. [16].

2. Tent Map 

We next consider the tent map: 

$$
f(x) = \begin{cases} a x & x < \tfrac{1}{2} \\ a (1 - x) & x \geq \tfrac{1}{2} \end{cases} \qquad (20)
$$

where $a \in [0, 2]$ is the control parameter. For $a \in [1, 2]$, the entropy rate is $h_\mu = \log_2 a$; when $a \in [0, 1]$, $h_\mu = 0$. Figure 3 shows 1,200 $(h_\mu, E)$-pairs in which E is calculated numerically from empirical estimates of the binary word distribution $P(s^L)$.
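A small sketch of Eq. (20) and of the entropy-rate formula quoted above, with illustrative function names. One practical caveat follows from IEEE double-precision arithmetic rather than from the source: at a = 2 every tent-map step is computed exactly, so a floating-point orbit loses one binary digit per iteration and collapses onto the fixed point x = 0 after about as many steps as $x_0$ has binary digits. Numerical work near a = 2 therefore needs care.

```python
import math


def tent(x: float, a: float) -> float:
    """Tent map of Eq. (20): a*x for x < 1/2, a*(1 - x) for x >= 1/2."""
    return a * x if x < 0.5 else a * (1.0 - x)


def tent_entropy_rate(a: float) -> float:
    """Entropy rate in bits per symbol, as stated in the text:
    log2(a) for a in [1, 2] and 0 for a in [0, 1]."""
    if not 0.0 <= a <= 2.0:
        raise ValueError("the control parameter a must lie in [0, 2]")
    return math.log2(a) if a >= 1.0 else 0.0


if __name__ == "__main__":
    for a in (0.8, 1.2, math.sqrt(2.0), 1.7, 2.0):
        print(f"a = {a:.4f}: h_mu = {tent_entropy_rate(a):.4f} bits/symbol")

    # Floating-point caveat at a = 2: the orbit collapses onto x = 0.
    x = 0.1234567
    for _ in range(200):
        x = tent(x, 2.0)
    print("x after 200 iterations at a = 2:", x)
```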

Reference [16] developed a phenomenological theory that explains the properties of the tent map at the so-called band-merging points, where bands of the chaotic attractor merge pairwise as a function of the control parameter. The behavior at these points is noisy periodic: the order of band visitations is periodic, but motion within each band is deterministic chaotic. Band-merging points occur when $a = 2^{2^{-n}}$. The symbolic-dynamic process is described by a Markov chain consisting of a periodic cycle of $2^n$ states in which all state-to-state transitions are nonbranching except for one where $s_i = 0$ or $s_i = 1$ with equal probability. Thus,
FIG. 3: Excess entropy E versus entropy density $h_\mu$ for the tent map. The L used to estimate $P(s^L)$, and so E and $h_\mu$, varied depending on the a parameter. The largest L used was L = 120 at low $h_\mu$. The plot shows 1,200 $(h_\mu, E)$-pairs. The parameter was incremented every $\Delta a = 5 \times 10^{-4}$ for $a \in [1, 1.2]$ and then every $\Delta a = 0.001$ for $a \in [1.2, 2.0]$. For each parameter value with positive entropy, $10^7$ words of length L were sampled.

each phase of the Markov chain has zero entropy per transition, except for the one that has a branching entropy of 1 bit. The entropy rate at band-mergings is thus $h_\mu = 2^{-n}$, with n an integer.

The excess entropy for the symbolic-dynamic process at the $2^n$-to-$2^{n-1}$ band-merging is simply $E = \log_2 2^n = n$. That is, the process carries n bits of phase information. Putting these facts together, then, we have a very simple relationship in the complexity-entropy diagram at band-mergings:

$$ E = -\log_2 h_\mu . \qquad (21) $$

This is graphed as the dashed line in Fig. 3. It is clear that the entire complexity-entropy diagram is much richer than this simple expression indicates. Nonetheless, Eq. (21) does capture the overall shape quite well.
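The band-merging relations are simple enough to tabulate directly. This sketch evaluates $a_n = 2^{2^{-n}}$, $h_\mu = \log_2 a_n = 2^{-n}$, and $E = n$ for the first few n and checks them against Eq. (21); it only restates the formulas in the text and does not simulate the tent map. The function name is illustrative.

```python
import math
from typing import Tuple


def band_merging_point(n: int) -> Tuple[float, float, float]:
    """(a_n, h_mu, E) at the 2^n-to-2^(n-1) band-merging of the tent map."""
    a_n = 2.0 ** (2.0 ** -n)   # parameter value of the band-merging
    h_mu = math.log2(a_n)      # one fair branching per cycle of 2^n states
    E = float(n)               # n bits of phase information
    return a_n, h_mu, E


if __name__ == "__main__":
    for n in range(1, 6):
        a_n, h_mu, E = band_merging_point(n)
        print(f"n = {n}: a = {a_n:.6f}, h_mu = {h_mu:.6f}, "
              f"E = {E:.0f}, -log2(h_mu) = {-math.log2(h_mu):.6f}")
```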

Note that, in sharp contrast to the logistic map, for the tent map it does appear as if the excess entropy takes on only a single value for each value of the entropy rate $h_\mu$. The reason for this is straightforward. The entropy rate is a simple monotonic function of the parameter a, namely $h_\mu = \log_2 a$, and so there is a one-to-one relationship between them. As a result, each $h_\mu$ value on the complexity-entropy diagram corresponds to one and only one value of a and, in turn, to one and only one value of E. Interestingly, the excess entropy appears to be a continuous function of $h_\mu$, although not a differentiable one.



B. Ising Spin Systems 

We now investigate the complexity-entropy diagrams of the Ising model in one and two spatial dimensions. Ising models are among the simplest physical models of spatially extended systems. Originally introduced to model magnetic materials, they are now used to model a wide range of cooperative phenomena and order-disorder transitions and, more generally, are viewed as generic models of spatially extended, statistical mechanical systems [89, 90]. Like the logistic and tent maps, Ising models are also studied as an intrinsically interesting mathematical topic. As we will see, Ising models provide an interesting contrast with the intrinsic computation seen in the interval maps.

Specifically, we consider spin-1/2 Ising models with nearest-neighbor (NN) and next-nearest-neighbor (NNN) interactions. The Hamiltonian (energy function) for such a system is:

$$ H = -J_1 \sum_{\langle i,j \rangle} s_i s_j - J_2 \sum_{\langle\langle i,j \rangle\rangle} s_i s_j - B \sum_i s_i , \qquad (22) $$

where the first (second) sum is understood to run over all NN (NNN) pairs of spins. In one dimension, a spin's nearest neighbors consist of two spins, one to the right and one to the left, whereas in two dimensions a spin has four nearest neighbors: left, right, up, and down. Each spin $s_i$ is a binary variable: $s_i \in \{-1, +1\}$. The coupling constant $J_1$ is a parameter that, when positive (negative), makes it energetically favorable for NN spins to (anti-)align. The constant $J_2$ has the same effect on NNN spins. The parameter B may be viewed as an external field; its effect is to make it energetically favorable for spins to point up (i.e., have a value of +1) instead of down. The probability of a configuration is taken to be proportional to its Boltzmann weight: the probability of a spin configuration C is proportional to $e^{-\beta H(C)}$, where $\beta = 1/T$ is the inverse temperature.
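To make Eq. (22) concrete in one dimension, the sketch below evaluates the energy of a ring of spins and its unnormalized Boltzmann weight $e^{-\beta H}$. Periodic boundary conditions, units with $k_B = 1$, and the function names are assumptions for illustration; the ring needs at least five sites so that NN and NNN bonds are not counted twice.

```python
import math
from typing import Sequence


def ising_energy_1d(spins: Sequence[int], J1: float, J2: float, B: float) -> float:
    """Energy of Eq. (22) for a ring of spins s_i in {-1, +1}:
    H = -J1 sum_i s_i s_{i+1} - J2 sum_i s_i s_{i+2} - B sum_i s_i."""
    N = len(spins)
    if N < 5:
        raise ValueError("use at least 5 spins so NN and NNN bonds stay distinct")
    nn = sum(spins[i] * spins[(i + 1) % N] for i in range(N))
    nnn = sum(spins[i] * spins[(i + 2) % N] for i in range(N))
    return -J1 * nn - J2 * nnn - B * sum(spins)


def boltzmann_weight(spins: Sequence[int], J1: float, J2: float,
                     B: float, T: float) -> float:
    """Unnormalized probability exp(-beta * H) with beta = 1/T."""
    return math.exp(-ising_energy_1d(spins, J1, J2, B) / T)


if __name__ == "__main__":
    neel, up = [1, -1] * 5, [1] * 10
    # With J1 < 0 the alternating (antiferromagnetic) ring has the lower energy.
    print(ising_energy_1d(neel, -1.0, 0.0, 0.0), ising_energy_1d(up, -1.0, 0.0, 0.0))
```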

In equilibrium statistical mechanics, the entropy density is a monotonic increasing function of the temperature. Quite generically, a plot of the entropy $h_\mu$ as a function of temperature T resembles that of the top plot in Fig. 6. Thus, $h_\mu$ may be viewed as a nonlinearly rescaled temperature. One might ask, then, why one would want to plot complexity versus entropy: Isn't a plot of complexity versus temperature qualitatively the same? Indeed, the two plots would look very similar. However, complexity-entropy diagrams offer two major benefits for statistical mechanical systems. First, the entropy directly captures the system's unpredictability, measured in bits per spin. The entropy thus measures the system's information processing properties. Second, plotting complexity versus entropy rather than temperature allows for a direct comparison of the range of information processing properties of statistical mechanical systems with systems for which there is no well-defined temperature, such as the deterministic dynamical systems of the previous section or the cellular automata of the subsequent one.
FIG. 4: Complexity-entropy diagram for the one-dimensional, spin-1/2 antiferromagnetic Ising model with nearest- and next-nearest-neighbor interactions. $10^5$ system parameters were sampled randomly from the following ranges: $J_1 \in [-8, 0]$, $J_2 \in [-8, 0]$, $T \in [0.05, 6.05]$, and $B \in [0, 3]$. For each parameter setting, the excess entropy E and entropy density $h_\mu$ were calculated analytically.



1. One-Dimensional Ising System

We begin by examining one-dimensional Ising systems. In Refs. |25l 0, [9l[ two of the authors developed exact, analytic transfer-matrix methods for calculating $h_\mu$ and E in the thermodynamic ($N \to \infty$) limit. These methods make use of the fact that the NNN Ising model is order-2 Markovian. We used these methods to produce Fig. 4, the complexity-entropy diagram for the NNN Ising system with antiferromagnetic coupling constants $J_1$ and $J_2$ that tend to anti-align coupled spins. The figure gives a scatter plot of $10^5$ $(h_\mu, E)$ pairs for system parameters that were sampled randomly from the following ranges: $J_1 \in [-8, 0]$, $J_2 \in [-8, 0]$, $T \in [0.05, 6.05]$, and $B \in [0, 3]$. For each parameter realization, the excess entropy E and entropy density $h_\mu$ were calculated. Figure 4 is rather striking: the $(h_\mu, E)$ pairs are organized in the shape of a "batcape". Why does the plot have this form?

Recall that if a sequence over a binary alphabet is periodic with period p, then $E = \log_2 p$ and $h_\mu = 0$. Thus, the "tips" of the batcape at $h_\mu = 0$ correspond to crystalline (periodic) spin configurations with periods 1, 2, 3, and 4. For example, the (0, 0) point is the period-1 configuration with all spins aligned. These periodic regimes correspond to the system's different possible ground states. As the entropy density increases, the cape tips widen and eventually join.
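The transfer-matrix route to $h_\mu$ and E mentioned above can be sketched as follows. Because the NNN chain is order-2 Markovian, the sketch takes spin pairs $(s_{i-1}, s_i)$ as Markov states, builds the 4 × 4 transfer matrix from the Boltzmann weights of Eq. (22), and converts it into a stochastic matrix using its Perron eigenvectors; for an order-2 process $E = H(2) - 2h_\mu$. This is a generic textbook construction written for illustration ($k_B = 1$, function name hypothetical), not necessarily the exact procedure of the cited references, and at very low temperatures the weights span so many orders of magnitude that precision can degrade.

```python
import itertools

import numpy as np


def nnn_ising_h_and_E(J1: float, J2: float, B: float, T: float):
    """Entropy density h_mu and excess entropy E (bits per spin) of the 1D
    NNN Ising chain of Eq. (22) in the thermodynamic limit."""
    beta = 1.0 / T
    pairs = list(itertools.product((-1, 1), repeat=2))   # states (s_{i-1}, s_i)
    X = np.full((4, 4), -np.inf)                         # log-weights; -inf = forbidden
    for a, (s0, s1) in enumerate(pairs):
        for b, (t1, s2) in enumerate(pairs):
            if t1 == s1:  # (s0, s1) -> (s1, s2): consecutive pairs overlap on s_i
                X[a, b] = beta * (J1 * s1 * s2 + J2 * s0 * s2 + B * s2)
    V = np.exp(X - X[np.isfinite(X)].max())              # rescale to avoid overflow

    vals, right = np.linalg.eig(V)
    k = np.argmax(vals.real)
    lam, psi = vals[k].real, np.abs(right[:, k])         # Perron root, right vector
    vals_t, left = np.linalg.eig(V.T)
    phi = np.abs(left[:, np.argmax(vals_t.real)])        # Perron left vector

    P = V * psi[None, :] / (lam * psi[:, None])          # stochastic matrix on pairs
    pi = phi * psi / np.dot(phi, psi)                    # stationary pair distribution

    mask = P > 0
    h_mu = -np.sum((pi[:, None] * P)[mask] * np.log2(P[mask]))
    pos = pi > 0
    H2 = -np.sum(pi[pos] * np.log2(pi[pos]))             # H(2): entropy of spin pairs
    return float(h_mu), float(H2 - 2.0 * h_mu)


if __name__ == "__main__":
    for J1, J2, B, T in [(-1.0, 0.0, 0.0, 0.5), (-1.0, -0.6, 0.2, 1.0), (-2.0, -1.0, 1.5, 3.0)]:
        h, E = nnn_ising_h_and_E(J1, J2, B, T)
        print(f"J1={J1}, J2={J2}, B={B}, T={T}: h_mu={h:.4f}, E={E:.4f}, 2-2h_mu={2 - 2 * h:.4f}")
```

Because H(2) can never exceed 2 bits, every pair this function returns satisfies $E \leq 2 - 2h_\mu$; with $J_2 = B = 0$ it reduces to the NN chain, for which $E = 1 - h_\mu$.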

Figure 4 demonstrates in graphical form that there is organization in the process space defined by the Hamiltonian of Eq. (22). Specifically, for antiferromagnetic couplings, E and $h_\mu$ values do not uniformly fill the plane. There are forbidden regions in the complexity-entropy plane. Adding randomness ($h_\mu$) to the periodic ground states does not immediately destroy them. That is, there are low-entropy states that are almost periodic. The apparent upper linear bound is that of Eq. (TT5]) for a system with at most 4 Markov states or, equivalently, an order-2 Markov chain: $E \leq 2 - 2h_\mu$.

In contrast, in the logistic map's complexity-entropy diagram (Fig. 2) one does not see anything remotely like the batcape. This indicates that there are no low-entropy, almost-periodic configurations related to the exactly periodic configurations generated at zero entropy along the period-doubling route to chaos. Increasing the parameter there does not add randomness to a periodic orbit. Rather, it causes the system to bifurcate to a higher-period orbit.



2. Two-Dimensional Ising Model 

Thus far we have considered only one-dimensional systems, either temporal or spatial. However, the excess entropy can be extended to apply to two-dimensional configurations as well; for details, see Ref. (9^|. Using methods from there, we calculated the excess entropy and entropy density for the two-dimensional Ising model with nearest- and next-nearest-neighbor interactions. In other words, we calculated the complexity-entropy diagram for the two-dimensional version of the system whose complexity-entropy diagram is shown in Fig. 4. There are several different definitions for the excess entropy in two dimensions, all of which are similar but not identical. In Fig. 5 we used a version that is based on the mutual information and, hence, is denoted $E_I$ [92].

Figure 5 gives a scatter plot of 4,500 complexity-entropy pairs. System parameters in Eq. (22) were sampled randomly from the following ranges: $J_1 \in [-3, 0]$, $J_2 \in [-3, 0]$, $T \in [0.05, 4.05]$, and B = 0. For each parameter setting, the excess entropy $E_I$ and entropy density $h_\mu$ were estimated numerically; the configurations themselves were generated via a Monte Carlo simulation. For each $(h_\mu, E_I)$ point the simulation was run for 200,000 Monte Carlo updates per site to equilibrate. Configuration data were then taken for 20,000 Monte Carlo updates per site. The lattice size was a square of 48 × 48 spins. The long equilibration time is necessary because, for some Ising models at low temperature, single-spin-flip dynamics of the sort used here have very long transient times [H Hi!.
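To illustrate the kind of update involved, here is a minimal single-spin-flip Metropolis sweep for Eq. (22) on an L × L lattice with periodic boundaries. Two details are assumptions of this sketch rather than facts stated in the text: the NNN bonds are taken to be the four diagonal neighbors, and sites are proposed uniformly at random. Pure Python loops are far too slow for 200,000 sweeps of a 48 × 48 lattice, so the code shows the update rule only and is not a production implementation.

```python
import numpy as np


def metropolis_sweep(spins: np.ndarray, beta: float, J1: float, J2: float,
                     B: float, rng: np.random.Generator) -> np.ndarray:
    """One sweep (spins.size proposed single-spin flips) of the 2D Hamiltonian
    of Eq. (22); spins is an L x L array of +/-1, updated in place."""
    L = spins.shape[0]
    for _ in range(spins.size):
        i, j = rng.integers(0, L, size=2)
        up, down, left, right = (i - 1) % L, (i + 1) % L, (j - 1) % L, (j + 1) % L
        nn = spins[up, j] + spins[down, j] + spins[i, left] + spins[i, right]
        nnn = (spins[up, left] + spins[up, right]
               + spins[down, left] + spins[down, right])  # diagonals (assumed NNN)
        # Energy change from flipping s_ij: dE = 2 s_ij (J1*nn + J2*nnn + B)
        dE = 2.0 * spins[i, j] * (J1 * nn + J2 * nnn + B)
        if dE <= 0.0 or rng.random() < np.exp(-beta * dE):
            spins[i, j] = -spins[i, j]
    return spins


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    spins = rng.choice([-1, 1], size=(16, 16))
    for _ in range(100):
        metropolis_sweep(spins, beta=1.0, J1=-1.0, J2=-0.5, B=0.0, rng=rng)
    print("mean spin after 100 sweeps:", spins.mean())
```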

Note the similarity between Figs. 4 and 5. For the 2D model, there is also a near-linear upper bound: $E_I \leq 5(1 - h_\mu)$. In addition, one sees periodic spin configurations, as evidenced by the horizontal bands. An $E_I$ of 1 bit corresponds to a checkerboard of period 2; $E_I = 3$ corresponds to a checkerboard of period 4; while $E_I = 2$ corresponds to a "staircase" pattern of period 4. See Ref. [92] for illustrations. The two period-4 configurations are both ground states for the model in the parameter regime in which $|J_2| < |J_1|$ and $J_2 < 0$. At low temperatures, the state into which the system settles is
FIG. 5: Complexity-entropy diagram for the two-dimensional, spin-1/2 antiferromagnetic Ising model with nearest- and next-nearest-neighbor interactions. System parameters were sampled randomly from the following ranges: $J_1 \in [-3, 0]$, $J_2 \in [-3, 0]$, $T \in [0.05, 4.05]$, and B = 0. For each parameter setting, the excess entropy $E_I$ and entropy density $h_\mu$ were estimated numerically.



a matter of chance. 

Thus, the horizontal streaks in the low-entropy region of Fig. 5 are the different ground states possible for the system. In this regard Fig. 5 is qualitatively similar to Fig. 4: in each there are several possible ground states at $h_\mu = 0$ that persist as the entropy density is increased. However, in the two-dimensional system of Fig. 5 one sees a scatter of other values around the periodic bands. There are even $E_I$ values larger than 3. These $E_I$ values arise when parameters are selected in which the NN and NNN coupling strengths are similar: $J_1 \approx J_2$. When this is the case, there is no energy cost associated with a horizontal or vertical defect between the two possible ground states. As a result, at low temperatures the system effectively freezes into horizontal or vertical strips consisting of the different ground states. Depending on the number of strips and their relative widths, a number of different $E_I$ values are possible, including values well above 3, indicating very complex spatial structure.

Despite these differences, the similarities between the complexity-entropy plots for the one- and two-dimensional systems are clearly evident. This is all the more noteworthy since one- and two-dimensional Ising models are regarded as very different sorts of system by those who focus solely on phase transitions. The two-dimensional Ising model has a critical phase transition, while the one-dimensional model does not. And, more generally, two-dimensional random fields are considered very different mathematical entities than one-dimensional sequences. Nevertheless, the two complexity-entropy diagrams show that, away from criticality, the one- and two-dimensional Ising systems' ranges of intrinsic computation are similar.



3. Ising Model Phase Transition 

As noted above, the two-dimensional Ising model is well known as a canonical model of a system that undergoes a continuous phase transition, in which the system's properties change nonanalytically as a parameter is continuously varied. The 2D NN Ising model with ferromagnetic ($J_1 > 0$) bonds, no NNN coupling ($J_2 = 0$), and zero external field (B = 0) undergoes a phase transition at $T = T_c \approx 2.269$ when $J_1 = 1$. At the critical temperature $T_c$ the magnetic susceptibility diverges and the specific heat is not differentiable. In Fig. 5 we restricted ourselves to antiferromagnetic couplings and thus did not sample the region of parameter space in which the phase transition occurs.

What happens if we fix $J_1 = 1$, $J_2 = 0$, and B = 0, and vary the temperature? In this case, we see that the complexity, as measured by E, shows a sharp maximum near the critical temperature $T_c$. Figure 6 shows results obtained via a Monte Carlo simulation on a 100 × 100 lattice. We used a Wolff cluster algorithm and periodic boundary conditions. After $10^6$ Monte Carlo steps (one step is one proposed cluster flip), 25,000 configurations were sampled, with 200 Monte Carlo steps between measurements. This process was repeated for over 200 samples between T = 0 and T = 6. More temperatures were sampled near the critical region.
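The Wolff cluster update named above can be sketched for the ferromagnetic NN case ($J_1 = J > 0$, $J_2 = 0$, $B = 0$) with periodic boundaries: starting from a random seed site, aligned neighbors join the cluster with probability $p_{\text{add}} = 1 - e^{-2\beta J}$, and the whole cluster is then flipped. The function name and breadth-first growth order are illustrative choices, and the lattice side is assumed to be at least 3. The constant `T_C` is Onsager's exact critical temperature $T_c = 2J/\ln(1 + \sqrt{2}) \approx 2.269$ (with $k_B = 1$, J = 1), the infinite-lattice value quoted in the text.

```python
from collections import deque

import numpy as np

T_C = 2.0 / np.log(1.0 + np.sqrt(2.0))  # Onsager's exact T_c for J = 1, k_B = 1


def wolff_step(spins: np.ndarray, beta: float, J: float,
               rng: np.random.Generator) -> int:
    """Grow and flip one Wolff cluster in place; return the cluster size."""
    L = spins.shape[0]
    p_add = 1.0 - np.exp(-2.0 * beta * J)
    i, j = (int(v) for v in rng.integers(0, L, size=2))
    seed = spins[i, j]
    cluster = {(i, j)}
    frontier = deque([(i, j)])
    while frontier:
        x, y = frontier.popleft()
        for nx, ny in (((x + 1) % L, y), ((x - 1) % L, y),
                       (x, (y + 1) % L), (x, (y - 1) % L)):
            # Each bond to an aligned neighbor outside the cluster is tried once.
            if ((nx, ny) not in cluster and spins[nx, ny] == seed
                    and rng.random() < p_add):
                cluster.add((nx, ny))
                frontier.append((nx, ny))
    for x, y in cluster:
        spins[x, y] = -seed
    return len(cluster)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spins = rng.choice([-1, 1], size=(24, 24))
    for T in (1.5, T_C, 3.5):
        sizes = [wolff_step(spins, 1.0 / T, 1.0, rng) for _ in range(300)]
        print(f"T = {T:.3f}: |m| = {abs(spins.mean()):.3f}, "
              f"mean cluster size = {np.mean(sizes):.1f}")
```

Unlike single-spin flips, cluster moves decorrelate configurations efficiently near $T_c$, which is the usual reason for preferring them in critical-region simulations.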

In Fig. 6 we first plot entropy density $h_\mu$ and excess entropy E versus temperature. As expected, the excess entropy reaches a maximum at the critical temperature $T_c$. At $T_c$ the correlations in the system decay algebraically, whereas they decay exponentially at all other temperatures. Hence, E, which may be viewed as a global measure of correlation, is maximized at $T_c$. For the system of Fig. 6, $T_c$ appears to have an approximate value of 2.42. This is above the exact value for an infinite system, which is $T_c \approx 2.27$. Our estimated value is higher, as one expects for a finite lattice. At the critical temperature, $h_\mu \approx 0.57$ and $E \approx 0.413$.

Also in Fig. 6 we show the complexity-entropy diagram for the 2D Ising model. This complexity-entropy diagram is a single curve, instead of the scatter plots seen in the previous complexity-entropy diagrams. The reason is that we varied a single parameter, the temperature, and entropy is a single-valued function of the temperature, as can clearly be seen in the first plot in Fig. 6. Hence, there is only one value of $h_\mu$ for each temperature, leading to a single curve for the complexity-entropy diagram.

Note that the peak in the complexity-entropy diagram for the 2D Ising model is rather rounded, whereas E plotted versus temperature shows a much sharper peak. The reason for this rounding is that the entropy density $h_\mu$ changes very rapidly near $T_c$. The effect is to smooth the E curve when plotted against $h_\mu$.

A similar complexity-entropy diagram was produced by Arnold [75]. He also estimated the excess entropy, but did so by considering only one-dimensional sequences of measurements obtained at a single site, while a Monte Carlo
FIG. 6: Entropy rate vs. temperature, excess entropy vs. temperature, and the complexity-entropy diagram for the 2D NN ferromagnetic Ising model. Monte Carlo results for 200 temperatures between 0 and 6. The temperature was sampled more densely near the critical temperature. For further discussion, see text.

simulation generated a sequence of two-dimensional configurations. Thus, those results do not account for two-dimensional structure but, rather, reflect properties of the dynamics of the particular Monte Carlo updating algorithm used. Nevertheless, the results of Ref. [75] are qualitatively similar to ours.

Erb and Ay [96] have calculated the multi-information for the two-dimensional Ising model as a function of temperature. The multi-information is the difference between the entropy of a single site and the entropy rate: $H(1) - h_\mu$. That is, the multi-information is only the leading term in the sum that defines the excess entropy, Eq. HU). (Recall that $h_\mu(1) = H(1)$.) They find that the multi-information is a continuous function of the temperature and that it reaches a sharp peak at the critical temperature [96, Fig. 4].
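The relation between the two quantities can be spelled out with the finite-L entropy-rate estimates $h_\mu(L) = H(L) - H(L-1)$, taking $H(0) = 0$. Assuming the excess entropy is written in its standard form as a sum of these differences, the multi-information is its L = 1 term:

```latex
E = \sum_{L=1}^{\infty} \left[ h_\mu(L) - h_\mu \right]
  = \underbrace{\left[ H(1) - h_\mu \right]}_{\text{multi-information}}
  + \sum_{L=2}^{\infty} \left[ h_\mu(L) - h_\mu \right] .
```

Since $h_\mu(L)$ decreases monotonically to $h_\mu$, every term is nonnegative, so the multi-information is a lower bound on E; the two coincide exactly when $h_\mu(2) = h_\mu$, as for an order-1 Markov chain.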



C. Cellular Automata 

The next process class we consider is cellular automata 
(CAs) in one and two spatial dimensions. Like spin sys- 
tems, CAs are common prototypes used to model spa- 
tially extended dynamical systems. For reviews see, e.g., 
Refs. [§3, [H, [99|. Unlike the Ising models of the previ- 
ous section, the CAs that we study here are deterministic. 
There is no noise or temperature in the system. 

The states of the CAs we shall consider consist of one- or two-dimensional configurations $s = \ldots s^{-1}, s^{0}, s^{1}, \ldots$ of discrete K-ary local states $s^i \in \{0, 1, \ldots, K-1\}$. The configurations change in time according to a global update function $\Phi$:

$$ s_{t+1} = \Phi(s_t) , \qquad (23) $$

starting from an initial configuration $s_0$. What makes CAs cellular is that configurations evolve according to a local update rule. The value $s^i_{t+1}$ of site i at the next time step is a function $\phi$ of the site's previous value and the values of neighboring sites within some radius r:

$$ s^i_{t+1} = \phi\left(s^{i-r}_t, \ldots, s^i_t, \ldots, s^{i+r}_t\right) . \qquad (24) $$

All sites are updated synchronously. The CA update rule $\phi$ consists of specifying the output value $s^i_{t+1}$ for all possible neighborhood configurations $\eta^i_t = s^{i-r}_t, \ldots, s^i_t, \ldots, s^{i+r}_t$. Thus, for 1D radius-r CAs, there are $K^{2r+1}$ possible neighborhood configurations and $K^{K^{2r+1}}$ possible CA rules. The r = 1, K = 2 1D CAs are called elementary cellular automata [97].
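The rule-table construction just described translates directly into code. The sketch below updates a 1D, radius-r, K-state CA with periodic boundaries and reads the rule number in base K using Wolfram's numbering convention: the neighborhood, read as a base-K number with the leftmost site most significant, selects a digit of the rule number. The function names are illustrative; the counting helpers reproduce $K^{2r+1}$ and $K^{K^{2r+1}}$.

```python
from typing import List


def num_neighborhoods(r: int, K: int) -> int:
    return K ** (2 * r + 1)


def num_rules(r: int, K: int) -> int:
    return K ** num_neighborhoods(r, K)


def ca_step(config: List[int], rule: int, r: int = 1, K: int = 2) -> List[int]:
    """Synchronously update every site of a periodic 1D CA once (Eq. (24))."""
    N = len(config)
    # Output for neighborhood index m is the m-th base-K digit of the rule number.
    table = [(rule // K ** m) % K for m in range(num_neighborhoods(r, K))]
    new = []
    for i in range(N):
        m = 0
        for d in range(-r, r + 1):          # leftmost neighbor is most significant
            m = m * K + config[(i + d) % N]
        new.append(table[m])
    return new


if __name__ == "__main__":
    row = [0] * 15
    row[7] = 1
    for _ in range(7):                      # elementary rule 90 from a single 1
        print("".join(".#"[s] for s in row))
        row = ca_step(row, rule=90)
```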

In all CA simulations reported we began with an arbitrary random initial configuration $s_0$ and iterated the CA several thousand times to let transient behavior die away. Configuration statistics were then accumulated for an additional period of thousands of time steps, as appropriate. Periodic boundary conditions on the underlying lattice were used.

In Fig. 7 we show the results of calculating various complexity-entropy diagrams for 1D, r = 2, K = 2 (binary) cellular automata. There are $2^{2^5} = 2^{32} \approx 4.3 \times 10^9$ such CAs. We cannot examine all 4.3 billion CAs; instead we sample the space of these CAs uniformly. For the data of Fig. 7 the lattice has $5 \times 10^4$ sites and a transient time of $5 \times 10^4$ iterations was used. We plot $h_\mu$ versus E for spatial sequences. Plots for the temporal sequences are
FIG. 7: Spatial entropy density $h_\mu^s$ and spatial excess entropy $E^s$ for a random sampling of 10 r = 2, binary 1D CAs.



qualitatively similar. There are several things to observe 
in these diagrams. 

One feature to notice in Fig. 7 is that no sharp peak in the excess entropy appears at some intermediate $h_\mu$ value. Instead, the maximum possible excess entropy falls off moderately rapidly with increasing $h_\mu$. A linear upper bound, $E \leq 4(1 - h_\mu)$, is almost completely respected. Note that, as is the case with the other complexity-entropy diagrams presented here, for all $h_\mu$ values except $h_\mu = 1$, there is a range of possible excess entropies.

In the early 1990s there was considerable exploration of the organization of CA rule space. In particular, a series of papers [HI, 100, 101, 102] looked at two-dimensional eight-state (K = 8) cellular automata, with a neighborhood size of 5 sites: the site itself and its nearest neighbors to the north, east, west, and south. These references reported evidence for the existence of a phase transition in the complexity-entropy diagram at a critical entropy level. In contrast, however, here and in the previous sections we find no evidence for such a transition. The reasons that Refs. [H, 100, 101, 102] report a transition are two-fold. First, they used very restricted measures of randomness and complexity: entropy of single isolated sites and mutual information of neighboring pairs of single sites, respectively. These choices have the effect of projecting organization onto their complexity-entropy diagrams. The organization seen is largely a reflection of constraints on the chosen measures, not of intrinsic properties of the CAs. Second, they do not sample the space of CAs uniformly; rather, they parametrize the space of CAs and sample only by sweeping their single parameter. This results in a sample of CA space that is very different from uniform and that is biased toward higher-complexity CAs. For a further discussion of complexity-entropy diagrams for cellular automata, including a discussion of Refs. [H, 100, 101, 102], see Ref. [103].



D. Markov Chain Processes 



In this and the next section, we consider two classes of process that provide a basis of comparison for the preceding nonlinear dynamics and statistical mechanical systems: those generated by Markov chains and by topological ε-machines. These classes are complementary to each other in the following sense. Topological ε-machines represent structure in terms of which sequences (or configurations) are allowed or not. When we explore the space of topological ε-machines, the associated processes differ in which sets of sequences occur and which are forbidden. In contrast, when exploring Markov chains, we fix a set of allowed words (in the present case the full set of binary sequences) and then vary the probability with which subwords occur. These two classes thus represent two different types of possible organization in intrinsic computation, types that were mixed in the preceding example systems.

In Fig. 8 we plot E versus $h_\mu$ for order-2 (4-state) Markov chains over a binary alphabet. Each element in the stochastic transition matrix T is chosen uniformly from the unit interval. The elements of the matrix are then normalized row by row so that $\sum_j T_{ij} = 1$. We generated $10^5$ such matrices and formed the complexity-entropy diagram shown in Fig. 8. Since these processes are order-2 Markov chains, the bound of Eq. (TT5|) applies. This bound is the sharp, linear upper limit evident in Fig. 8: $E = 2 - 2h_\mu$.
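A sketch of how $(h_\mu, E)$ pairs and the bound can be computed for binary order-2 chains. Here the Markov states are the last two symbols, so each state has only two allowed successors; the two weights per state are drawn uniformly from the unit interval and normalized. That restriction is an assumption made for this illustration and may differ from the exact sampling behind Fig. 8. For an order-2 chain $E = H(2) - 2h_\mu$, and because $H(2) \leq 2$ bits, every sampled pair satisfies $E \leq 2 - 2h_\mu$. Function names are hypothetical.

```python
import numpy as np


def order2_h_and_E(q: np.ndarray):
    """h_mu and E (bits) of a binary order-2 Markov chain, where
    q[a, b] = Pr(next symbol = 1 | previous two symbols a, b)."""
    T = np.zeros((4, 4))                   # state index of (a, b) is 2a + b
    for a in (0, 1):
        for b in (0, 1):
            T[2 * a + b, 2 * b + 1] = q[a, b]        # emit 1: (a, b) -> (b, 1)
            T[2 * a + b, 2 * b] = 1.0 - q[a, b]      # emit 0: (a, b) -> (b, 0)
    vals, vecs = np.linalg.eig(T.T)
    pi = np.abs(vecs[:, np.argmax(vals.real)])       # eigenvalue 1: stationary dist.
    pi /= pi.sum()
    mask = T > 0
    h_mu = -np.sum((pi[:, None] * T)[mask] * np.log2(T[mask]))
    pos = pi > 0
    H2 = -np.sum(pi[pos] * np.log2(pi[pos]))
    return float(h_mu), float(H2 - 2.0 * h_mu)


def random_order2_chain(rng: np.random.Generator) -> np.ndarray:
    """Draw two uniform weights per state and normalize them into q[a, b]."""
    w = rng.random((2, 2, 2))              # w[a, b, c]: weight for emitting c
    return w[:, :, 1] / w.sum(axis=2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pairs = [order2_h_and_E(random_order2_chain(rng)) for _ in range(2000)]
    print("largest E - (2 - 2 h_mu):", max(E - (2 - 2 * h) for h, E in pairs))
```

The printed quantity should be nonpositive up to rounding, since the bound holds for every order-2 chain.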

It is illustrative to compare the 4-state Markov chains considered here with the 1D NNN Ising models of Sec. III B 1. The order-2 (or 4-state Markov) chains with a binary alphabet are those systems for which the value of a site depends on the previous two sites, but no others. In terms of spin systems, then, this is a spin-1/2 (i.e., binary) system with nearest- and next-nearest-neighbor interactions. The transition matrix for the Markov chain is 4 × 4 and thus has 16 elements. However, since each row of the transition matrix must be normalized, there are 12 independent parameters for this model class. In contrast, the 1D NNN Ising chain has only four parameters, $J_1$, $J_2$, B, and the temperature T, and since one of them may be viewed as setting an energy scale, only three are independent.

Thus, the 1D NNN systems are a proper subset of the 4-state Markov chains. Note that their complexity-entropy diagrams are very different, as a quick glance at Figs. 4 and 8 confirms. The reason for this is that the Ising model, due to its parametrization (via the Hamiltonian of Eq. (22)), samples the space of processes in a very different way than the Markov chains. This underscores the crucial role played by the choice of model and, so too, the choice in parametrizing a model space. Different parametrizations of the same model class, when sampled uniformly over those parameters, yield complexity-entropy diagrams with different structural properties.
FIG. 8: Excess-entropy, entropy-rate pairs for $10^5$ randomly selected 4-state Markov chains.

E. The Space of Processes: Topological ε-Machines

The preceding model classes are familiar from dynamical systems theory, statistical mechanics, and stochastic process theory. Each has served a historical purpose in its respective field, purposes that reflect mathematically, physically, or statistically useful parametrizations of the space of processes. In the preceding sections we explored these classes, asked what sorts of processes they could generate, and then calculated complexity-entropy pairs for each process to reveal the range of possible information processing within each class.

Is there a way, though, to directly explore the space 
of processes, without assuming a particular model class 
or parametrization? Can each process be taken at face 
value and tell us how it is structured? More to the point, 
can we avoid making structural assumptions, as done in 
the preceding sections? 

Affirmative answers to these questions are found in the approach laid out by computational mechanics [3 Gil l2q |. Computational mechanics demonstrates that each process has an optimal, minimal, and unique representation, the ε-machine, that captures the process's structure. Due to optimality, minimality, and uniqueness, the ε-machine may be viewed as the representation of its associated process. In this sense, this representation is parameter free. To determine an ε-machine for a process one calculates a set of causal states and their transitions. In other words, one does not specify a priori the number of states or the transition structure between them. Determining the ε-machine makes no such structural assumptions [lH, [28|.

Using the one-to-one relationship between processes and their ε-machines, here we invert the preceding logic of going from a process to its ε-machine. We explore the space of processes by systematically enumerating ε-machines and then calculating their excess entropies E and their entropy rates $h_\mu$. This gives a direct view of how intrinsic computation is organized in the space of processes.

As a complement to the Markov chain exploration of 
Causal states n    Topological ε-machines
1                  3
2                  7
3                  78
4                  1,388
5                  35,186

TABLE I: The number of topological binary ε-machines up to n = 5 causal states. (After Ref. [105].)

how intrinsic computation depends on transition probability variation, here we examine how an ε-machine's structure (states and their connectivity) affects information processing. We do this by restricting attention to the class of topological ε-machines whose branching transition probabilities are fair (equally probable). (An example is shown in Fig. 10.)
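For a topological ε-machine with fair branching, the entropy rate follows directly from its structure: a state with k outgoing transitions contributes $\log_2 k$ bits, weighted by that state's stationary probability. The sketch below uses a dictionary encoding of machines chosen purely for illustration (states labeled 0, ..., n-1). It evaluates a 5-state cyclic machine of the kind shown in Fig. 10, with three branching states (which three is an arbitrary choice here), and the golden-mean process (no two consecutive 0s), a standard two-state example.

```python
import math
from typing import Dict, List, Tuple

import numpy as np

# machine[state] = list of (symbol, next_state); branches are equally likely.
Machine = Dict[int, List[Tuple[int, int]]]


def stationary(machine: Machine) -> np.ndarray:
    """Stationary state distribution, solving pi T = pi with sum(pi) = 1."""
    n = len(machine)
    T = np.zeros((n, n))
    for s, edges in machine.items():
        for _, t in edges:
            T[s, t] += 1.0 / len(edges)
    A = np.vstack([T.T - np.eye(n), np.ones(n)])
    b = np.concatenate([np.zeros(n), [1.0]])
    return np.linalg.lstsq(A, b, rcond=None)[0]


def topological_entropy_rate(machine: Machine) -> float:
    """h_mu = sum_s pi_s log2(out-degree of s) for fair branching."""
    pi = stationary(machine)
    return float(sum(pi[s] * math.log2(len(e)) for s, e in machine.items()))


# States 0, 1, 2 branch (emit 0 or 1); states 3, 4 emit a single symbol.
# Every transition leads to the next state in the cycle.
cyclic_5_3: Machine = {
    s: ([(0, (s + 1) % 5), (1, (s + 1) % 5)] if s < 3 else [(1, (s + 1) % 5)])
    for s in range(5)
}
golden_mean: Machine = {0: [(1, 0), (0, 1)], 1: [(1, 0)]}

if __name__ == "__main__":
    print("cyclic 5-state, 3 branchings:", topological_entropy_rate(cyclic_5_3))
    print("golden mean:", topological_entropy_rate(golden_mean))
```

The stationary distribution is obtained from a linear solve rather than by power iteration, because cyclic machines are periodic chains for which naive iteration does not converge. Excess entropy is not computed here; for the cyclic example the Fig. 10 caption gives $\log_2 5$.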

If we regard two ε-machines that are isomorphic up to variation in transition probabilities as members of a single equivalence class, then each such class of ε-machines contains precisely one topological ε-machine. (Symbolic dynamics [104] refers to a related class of representations as topological Markov chains. An essential, and important, difference is that ε-machines always have the smallest number of states.)

It turns out that the topological ε-machines with a finite number of states can be systematically enumerated [105]. Here we consider only ε-machines for binary processes: $\mathcal{A} = \{0, 1\}$. Two ε-machines are isomorphic, and generate essentially the same stochastic process, if they are related by a relabeling of states or if their output symbols are exchanged: 0 is mapped to 1 and vice versa. The number of isomorphically distinct topological ε-machines of n = 1, ..., 5 states is listed in Table I.

In Fig. 9 we plot their $(h_\mu, E)$ pairs. There one sees that the complexity-entropy diagram exhibits quite a bit of organization, with variations from very low to very high density of ε-machines coexisting with several distinct vertical (iso-entropy) families. To better understand the structure in the complexity-entropy diagram, though, it is helpful to consider bounds on the complexities and entropies of Fig. 9. The minimum complexity, $E = 0$, corresponds to machines with only a single state. There are two possibilities for such binary ε-machines: either they generate all 1s (or all 0s), or they generate all sequences with equal probability (at each length). If the former, $h_\mu = 0$; if the latter, $h_\mu = 1$. These two points, $(0, 0)$ and $(1, 0)$, are denoted with solid circles along Fig. 9's horizontal axis.
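For a topological ε-machine, fair branching makes the entropy rate simple to compute: each state $s$ contributes $\log_2 k_s$ bits, where $k_s$ is its number of outgoing transitions, weighted by the stationary state probability $\pi(s)$, so $h_\mu = \sum_s \pi(s) \log_2 k_s$. The sketch below assumes a hypothetical dict representation, `machine[state] = {symbol: next_state}`, and finds $\pi$ by power iteration on a "lazy" version of the state chain (stay put with probability 1/2). That choice is ours, made so the iteration also converges for periodic machines. It reproduces the two single-state endpoints, $h_\mu = 0$ and $h_\mu = 1$.

```python
from math import log2


def stationary(machine, iters=2000):
    """Stationary distribution of the state chain under fair branching.
    The lazy update pi <- (pi + pi T) / 2 has the same fixed point as
    pi <- pi T but also converges when the chain is periodic."""
    n = len(machine)
    pi = {s: 1.0 / n for s in machine}
    for _ in range(iters):
        new = {s: 0.5 * pi[s] for s in machine}
        for s, out in machine.items():
            for t in out.values():
                new[t] += 0.5 * pi[s] / len(out)
        pi = new
    return pi


def entropy_rate(machine):
    """h_mu = sum_s pi(s) * log2(k_s), with k_s the out-degree of state s."""
    pi = stationary(machine)
    return sum(pi[s] * log2(len(out)) for s, out in machine.items())


all_ones = {0: {1: 0}}                                # single state, only 1s
fair_coin = {0: {0: 0, 1: 0}}                         # single state, fair bits
golden_mean = {"A": {0: "A", 1: "B"}, "B": {0: "A"}}  # no two consecutive 1s
print(entropy_rate(all_ones), entropy_rate(fair_coin), entropy_rate(golden_mean))
```

The two single-state machines give 0.0 and 1.0; the two-state golden-mean example, included only for comparison, gives approximately 2/3.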

The maximum $E$ in the complexity-entropy diagram is $\log_2 5 \approx 2.3219$. One such ε-machine corresponds to a zero-entropy, period-5 process. There are four similar processes with periods $p = 1, 2, 3, 4$ at the points $(0, \log_2 p)$. These are denoted on the figure by the tokens along the left vertical axis.
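It is easy to check directly that a zero-entropy, period-$p$ process sits at $(0, \log_2 p)$. With $h_\mu = 0$ we have $E(L) = H(L)$. Once $L \ge p$, each of the $p$ equally likely phases of a word with minimal period $p$ produces a distinct length-$L$ block, so $H(L) = \log_2 p$. The sketch below computes $H(L)$ for a repeated word. The example words are arbitrary choices.

```python
from collections import Counter
from math import log2


def periodic_block_entropy(word, L):
    """H(L) for the stationary process that repeats `word` forever,
    with each of its len(word) phases equally likely."""
    p = len(word)
    blocks = Counter(
        "".join(word[(i + k) % p] for k in range(L)) for i in range(p)
    )
    return sum((c / p) * log2(p / c) for c in blocks.values())


for word in ["0", "01", "001", "0001", "00001"]:
    p = len(word)
    # h_mu = 0, so E(L) = H(L); at L = p the block entropy has saturated.
    print(p, periodic_block_entropy(word, p), log2(p))
```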

FIG. 9: Complexity-entropy pairs $(h_\mu, E)$ for all topological binary ε-machines with $n = 1, \ldots, 4$ states and for 35,041 of the 35,186 5-state ε-machines. The excess entropy is estimated as $E(L) = H(L) - L h_\mu$ using the exact value for the entropy rate $h_\mu$ and a storage-efficient type-class algorithm [106] for the block entropy $H(L)$. The estimates were made by increasing $L$ until $E(L) - E(L-1) < \delta$, where $\delta = 0.0001$ for 1, 2, and 3 states; $\delta = 0.0050$ for 4 states; and $\delta = 0.0100$ for 5 states.

FIG. 10: An example topological ε-machine for a cyclic process in $\mathcal{F}_{5,3}$. Note that branching occurs only between pairs of successive states in the cyclic chain. The excess entropy for this process is $\log_2 5 \approx 2.32$, and the entropy rate is 3/5.

There are other period-5 cyclic, partially random processes with maximal complexity, though: those with causal states in a cyclic chain. These have $b = 1, 2, 3, 4$ branching transitions between successive states in the chain, and so positive entropy. They appear as a horizontal line of enlarged square tokens in the upper portion of the complexity-entropy diagram. Denote the family of $p$-cyclic processes with $b$ branchings as $\mathcal{F}_{p,b}$. An ε-machine illustrating $\mathcal{F}_{5,3}$ is shown in Fig. 10. The excess entropy for this process is $\log_2 5 \approx 2.32$, and the entropy rate is 3/5.

Since ε-machines for cyclic processes consist of states in a single loop, their excess entropies provide an upper bound among ε-machines that generate $p$-cyclic processes with $b$ branching states, namely:

$$
E(\mathcal{F}_{p,b}) = \log_2 p . \tag{25}
$$

Clearly, $E(\mathcal{F}_{p,b}) \to \infty$ as $p \to \infty$. Their entropy rates are given by a similarly simple expression:

$$
h_\mu(\mathcal{F}_{p,b}) = \frac{b}{p} . \tag{26}
$$

Note that $h_\mu(\mathcal{F}_{p,b}) \to 0$ as $p \to \infty$ with fixed $b$, and $h_\mu(\mathcal{F}_{p,b}) \to 1$ as $b \to p$. Together, then, the family $\mathcal{F}_{p,b}$ gives an upper bound to the complexity-entropy diagram.
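Equations (25) and (26) can be checked numerically for a particular cyclic machine. The sketch below is not the type-class algorithm of Ref. [106] used for Fig. 9. Instead, as a swapped-in technique, it computes $H(L)$ exactly as a sum of next-symbol uncertainties, $H(L) = \sum_{t=0}^{L-1} H[X_t \mid X_0 \ldots X_{t-1}]$. It does this by propagating the observer's belief distribution over causal states, with beliefs kept as exact fractions so that identical beliefs merge. It then applies the Fig. 9 stopping rule, $E(L) - E(L-1) < \delta$. The $\mathcal{F}_{5,3}$ machine used here, with states 0, 1, 2 branching and states 3, 4 emitting 0, is a hypothetical example and not necessarily the one drawn in Fig. 10.

```python
from fractions import Fraction
from math import log2


def cyclic_machine(p, branching):
    """Hypothetical member of F_{p,b}: states in `branching` emit 0 or 1,
    the rest emit 0, and every edge leads to the next state in the loop."""
    return {s: ({0: (s + 1) % p, 1: (s + 1) % p} if s in branching
                else {0: (s + 1) % p}) for s in range(p)}


def excess_entropy(machine, pi, delta=1e-9, max_len=1000):
    """Return (h_mu, E(L), L), increasing L until E(L) - E(L-1) < delta.
    `pi` must hold exact Fractions so that identical beliefs merge exactly."""
    states = sorted(machine)
    index = {s: i for i, s in enumerate(states)}
    h_mu = sum(float(pi[s]) * log2(len(machine[s])) for s in states)
    beliefs = {tuple(pi[s] for s in states): 1.0}  # belief vector -> probability
    H = 0.0
    for L in range(1, max_len + 1):
        h_t = 0.0  # H[X_t | X_0 ... X_{t-1}] with t = L - 1
        new = {}
        for belief, weight in beliefs.items():
            sym_prob, nxt = {}, {}
            for i, s in enumerate(states):
                if belief[i] == 0:
                    continue
                for sym, nxt_state in machine[s].items():
                    q = belief[i] / len(machine[s])  # fair branching
                    sym_prob[sym] = sym_prob.get(sym, 0) + q
                    vec = nxt.setdefault(sym, [Fraction(0)] * len(states))
                    vec[index[nxt_state]] += q
            for sym, q in sym_prob.items():
                h_t -= weight * float(q) * log2(float(q))
                key = tuple(v / q for v in nxt[sym])  # updated belief
                new[key] = new.get(key, 0.0) + weight * float(q)
        beliefs = new
        H += h_t
        if h_t - h_mu < delta:  # E(L) - E(L-1) = h_t - h_mu
            break
    return h_mu, H - L * h_mu, L


p = 5
machine = cyclic_machine(p, branching={0, 1, 2})
pi = {s: Fraction(1, p) for s in machine}  # a single loop has uniform pi
h_mu, E, L = excess_entropy(machine, pi)
print(f"h_mu = {h_mu:.4f}, E({L}) = {E:.4f}, log2(5) = {log2(5):.4f}")
```

Because $H[X_t \mid X_0 \ldots X_{t-1}]$ never increases with $t$ for a stationary process, the stopping rule cannot fire early. The returned values should match Eq. (26), $h_\mu = 3/5$, and Eq. (25), $E \approx \log_2 5$, closely.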

The processes $\mathcal{F}_{p,b}$ are representatives of the highest points of the prominent jutting vertical towers of ε-machines so prevalent in Fig. 9. It therefore seems reasonable to expect the $(h_\mu, E)$ coordinates for $p$-cyclic process languages to possess at least $p - 1$ vertical towers, distributed evenly at $h_\mu = b/p$, $b = 1, \ldots, p - 1$, and for these towers to correspond with towers of $m$-cyclic process languages whenever $m$ is a multiple of $p$.
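The expected tower positions follow from Eq. (26) alone. The short check below uses exact fractions, with $p$ ranging over the state counts of Table I, and lists the entropy rates $b/p$ at which $p$-cyclic towers should appear. It also confirms that every $p$-cyclic position recurs among the $m$-cyclic positions whenever $p$ divides $m$, since $b/p = (bm/p)/m$. The function name is hypothetical.

```python
from fractions import Fraction


def tower_positions(p):
    """Entropy rates h_mu = b/p, b = 1, ..., p-1, expected for p-cyclic towers."""
    return {Fraction(b, p) for b in range(1, p)}


for p in range(2, 6):
    print(p, [str(h) for h in sorted(tower_positions(p))])

# Towers of p-cyclic languages reappear for every multiple m of p.
for p in range(2, 6):
    for m in range(p, 21, p):
        assert tower_positions(p) <= tower_positions(m)
```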
These upper bounds are one key difference from earlier classes, in which there was a decreasing linear upper bound on complexity as a function of entropy rate: $E \le R(1 - h_\mu)$. In the space of processes as a whole, however, many processes are not so constrained. The subspace of topological ε-machines illustrates that there are many highly entropic, highly structured processes. Some of the more familiar model classes appear to inherit, in their implied parametrization of process space, a bias away from such processes.
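To make the contrast concrete, one can ask how large $R$ would have to be before the linear bound $E \le R(1 - h_\mu)$ admits a point of $\mathcal{F}_{p,b}$, whose coordinates are given by Eqs. (25) and (26). The helper below is a hypothetical illustration for binary processes. It reads $R$ as the range (order) of the earlier classes and returns the smallest integer $R$ with $\log_2 p \le R(1 - b/p)$.

```python
from math import ceil, log2


def min_range(p, b):
    """Smallest integer R with log2(p) <= R * (1 - b/p): the smallest range
    whose linear bound E <= R(1 - h_mu) admits the point of F_{p,b}."""
    E = log2(p)    # Eq. (25)
    h_mu = b / p   # Eq. (26)
    return ceil(E / (1 - h_mu))


for p in range(2, 6):
    print(p, [min_range(p, b) for b in range(1, p)])
```

For example, $\mathcal{F}_{5,4}$, at $h_\mu = 0.8$ and $E \approx 2.32$, lies above the linear bound for every $R < 12$.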

It is easy to see that the families $\mathcal{F}_{p,p-1}$ and $\mathcal{F}_{p,1}$ provide upper and lower bounds for $h_\mu$, respectively, among the process languages that achieve maximal $E$ and for which $h_\mu > 0$. Indeed, the smallest positive $h_\mu$ possible is achieved when only one of the equally probable states has more than one outgoing transition.

More can be said about this picture of the space of intrinsic computation spanned by topological ε-machines [105]. Here, however, our aim is to illustrate how rich the diversity of intrinsic computation can be, and to do so independently of conventional model-class parametrizations. These results allow us to probe, in a systematic way, a subset of processes in which structure dominates.



IV. DISCUSSION AND CONCLUSION 

Complexity-entropy diagrams provide a common view of the intrinsic information processing embedded in different processes. We used them to compare markedly different systems: one-dimensional maps of the unit interval; one- and two-dimensional Ising models; cellular automata; Markov chains; and topological ε-machines. The exploration of each class turned different knobs, in the sense that we adjusted different parameters: temperature, nonlinearity, coupling strength, cellular automaton rule, and transition probabilities. Moreover, these parameters had very different effects. Changing the temperature and coupling constants in the Ising models altered the probabilities of configurations, but it did not change which configurations were allowed to occur. In contrast, the topological ε-machines exactly expressed what it means for different processes to have different sets of allowed sequences. Changing the CA rules or the nonlinearity parameter in the logistic map combined these effects: the allowed sequences, the probabilities of sequences, or both changed. In this way, the survey illustrated in dramatic fashion one of the benefits of the complexity-entropy diagram: it allows for a common comparison across rather varied systems.

For example, the complexity-entropy diagram for the radius-2, one-dimensional cellular automata, shown in Fig. 7, is very different from that of the logistic map, shown in Fig. 2. For the logistic map, there is a distinct lower bound for the excess entropy as a function of the entropy rate. In Fig. 2 this is seen as the large forbidden region in the diagram's lower portion. In sharp contrast, no such forbidden region is seen in Fig. 7.



At a more general level of comparison, the survey showed that for the cellular automata, at a given $h_\mu$, the excess entropy $E$ can be arbitrarily small. This suggests that the intrinsic computation of cellular automata and that of the logistic map are organized in fundamentally different ways. In turn, the 1D and 2D Ising systems exhibit yet another kind of information processing capability. Each has well defined ground states, seen as the zero-entropy tips of the "batcapes" in Figs. 4 and 5. These ground states are robust under small amounts of noise, i.e., as the temperature increases from zero. Thus, there are almost-periodic configurations at low entropy. In contrast, there do not appear to be any almost-periodic configurations at low entropy for the logistic map of Fig. 2.

Our last example, topological ε-machines, was a rather different kind of model class. In fact, we argued that it gave a direct view into the very structure of the space of processes. In this sense, the complexity-entropy diagram was parameter free. Note, however, that by choosing all branching probabilities to be fair, we intentionally biased this model class toward high-complexity, high-entropy processes. Nevertheless, the distinction between the topological ε-machine complexity-entropy diagram of Fig. 9 and the others is striking.

The diversity of possible complexity-entropy diagrams points to their utility as a way to compare information processing across different classes. Complexity-entropy diagrams can be calculated empirically from the observed configurations themselves. The organization reflected in the complexity-entropy diagram then provides clues to an appropriate model class for the system at hand. For example, finding a complexity-entropy diagram with a batcape structure like that of Figs. 4 and 5 would suggest that the class could be well modeled using energies that, in turn, are expressed via a Hamiltonian. Complexity-entropy diagrams may also be of use in classifying behavior within a model class. For example, as noted above, a type of complexity-entropy diagram has already been used successfully to distinguish between different types of structure in anatomical MRI images of brains [38, 57].
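To illustrate computing a complexity-entropy point directly from observed configurations, the sketch below applies simple plug-in estimates to a single symbol sequence. It estimates block entropies $H(L)$ from counts of length-$L$ windows. Wrapping around the end of the sequence is an assumption that keeps periodic test data exact. From these it forms the finite-$L$ entropy-rate estimate $h_\mu(L) = H(L) - H(L-1)$ and $E(L) = H(L) - L\,h_\mu(L)$. The function names and the choice of $L$ are illustrative. For real data, $2^L$ should stay much smaller than the sequence length for the counts to be reliable.

```python
from collections import Counter
from math import log2


def block_entropy(seq, L):
    """Plug-in H(L) from all length-L windows of `seq`, wrapping around."""
    if L == 0:
        return 0.0
    n = len(seq)
    ext = seq + seq[:L - 1]
    counts = Counter(ext[i:i + L] for i in range(n))
    return sum((c / n) * log2(n / c) for c in counts.values())


def complexity_entropy_point(seq, L):
    """Finite-L estimates (h_mu(L), E(L)) from one observed sequence, L >= 1."""
    h = block_entropy(seq, L) - block_entropy(seq, L - 1)
    return h, block_entropy(seq, L) - L * h


print(complexity_entropy_point("01" * 1000, 6))    # period 2: (0.0, 1.0)
print(complexity_entropy_point("00001" * 400, 8))  # period 5: E near log2(5)
```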

Ultimately, the main conclusion to draw from this survey is that there is a large diversity of complexity-entropy diagrams. There is certainly not a universal complexity-entropy curve, as once hoped. Nor is it the case that there are even qualitative similarities among complexity-entropy diagrams. They capture distinctive structure in the intrinsic information processing capabilities of a class of processes. This diversity is not a negative result. Rather, it indicates the utility of this type of intrinsic computation analysis, and it optimistically points to the richness of information processing available in the mathematical and natural worlds. Simply put, information processing is too complex to be simply universal.